Data was split across 3 files, which were joined
All non-relevant symbols in column names were removed
2 files are created: clean and clean_binary
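A minimal sketch of the column-name cleaning step, written in Python for illustration only (the notes do not say which tool was used). The column names and the helper `clean_colname` are hypothetical, and the rule applied here (drop punctuation, turn whitespace into underscores) is an assumption about what "non-relevant symbols" means.

```python
import re


def clean_colname(name: str) -> str:
    """Drop punctuation from a column name and join the remaining words with '_'."""
    # Remove every character that is not a letter, digit, underscore, or whitespace.
    no_symbols = re.sub(r"[^\w\s]", "", name)
    # Trim the ends and collapse internal whitespace runs into a single underscore.
    return re.sub(r"\s+", "_", no_symbols.strip())


# Hypothetical raw column names, not taken from the actual data files.
raw_colnames = ["var 1 (%)", "var-2", " diagnosis "]
clean_colnames = [clean_colname(c) for c in raw_colnames]
print(clean_colnames)  # ['var_1', 'var2', 'diagnosis']
```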
Linear regression of each variable
Group by variable and nest the data
Mutate: fit the model formula by mapping over the nested data
Map the tidy function over the fitted models
Unnest the tidied models to extract statistics
Add a significance column based on the q-value
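The last step above can be sketched as follows, in Python for illustration only. The notes do not say how the q-values were computed. This sketch assumes the Benjamini-Hochberg adjustment and a 0.05 cutoff. The variable names and p-values are hypothetical placeholders, not results from the analysis.

```python
def bh_adjust(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-variable regression results (one row per variable).
results = [
    {"variable": "var_1", "p_value": 0.001},
    {"variable": "var_2", "p_value": 0.04},
    {"variable": "var_3", "p_value": 0.03},
    {"variable": "var_4", "p_value": 0.5},
]

q_values = bh_adjust([row["p_value"] for row in results])
for row, q in zip(results, q_values):
    row["q_value"] = q
    row["significant"] = q < 0.05  # assumed cutoff, not stated in the notes

significant = [row["significant"] for row in results]
print(significant)  # [True, False, False, False]
```

Note that var_2 and var_3 have raw p-values below 0.05 but are not flagged after adjustment, which is the point of thresholding on the q-value rather than the p-value.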
PCA performed on numeric, scaled data
PC1 + PC2 plotted and colored based on diagnosis
The two diagnoses form clearly separated clusters
Plot showing variance explained by each PC
More than 40% of the variance is explained by PC1
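To make "variance explained by PC1" concrete, here is a rough standard-library Python sketch. It is not the routine used for the plots above: it swaps in power iteration to find only the first component, and the data rows are hypothetical. On scaled data the covariance matrix is the correlation matrix, so the total variance equals the number of variables.

```python
import math
import statistics


def scale_columns(rows):
    """Center each column and divide by its sample standard deviation."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def covariance_matrix(rows):
    """Sample covariance matrix of already-centered rows."""
    n, p = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(p)]
            for i in range(p)]


def first_component(cov, iterations=500):
    """Power iteration: approximate the dominant eigenvector and eigenvalue.

    Assumes the start vector is not orthogonal to PC1 and that the largest
    eigenvalue is distinct.
    """
    p = len(cov)
    vec = [1.0 / (i + 1) for i in range(p)]  # arbitrary non-symmetric start
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    # Rayleigh quotient of the unit vector gives the eigenvalue estimate.
    eigenvalue = sum(vec[i] * sum(cov[i][j] * vec[j] for j in range(p))
                     for i in range(p))
    return vec, eigenvalue


# Hypothetical numeric data: rows are samples, columns are variables.
rows = [
    [2.0, 1.9, 0.5],
    [3.0, 3.2, 0.1],
    [4.0, 3.9, 0.9],
    [5.0, 5.1, 0.4],
    [6.0, 6.2, 0.7],
]

cov = covariance_matrix(scale_columns(rows))
pc1, eigenvalue = first_component(cov)
total_variance = sum(cov[i][i] for i in range(len(cov)))
explained_pc1 = eigenvalue / total_variance
print(f"PC1 explains {explained_pc1:.1%} of the variance")
```

A full PCA routine would return every component at once. The variance-explained plot is then each eigenvalue divided by the same total.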
Workflow of supervised classifier
Forest plot